DRackSim: Simulator for Rack-scale Memory Disaggregation
Memory disaggregation has emerged as an alternative to the traditional server
architecture in data centers. This paper introduces DRackSim, a simulation
infrastructure for modeling rack-scale hardware-disaggregated memory. DRackSim
models multiple compute nodes, memory pools, and a rack-scale interconnect
similar to Gen-Z. An application-level simulation approach models an x86
out-of-order multi-core processor with a multi-level cache hierarchy at the
compute nodes. A queue-based simulation models the remote memory controller and
the rack-level interconnect, allowing both cache-based and page-based access to
remote memory. DRackSim also models a central memory manager that manages the
address space at the memory pools. We integrate the community-accepted DRAMSim2,
using multiple DRAMSim2 instances to simulate both local and remote memory. The
core and cache subsystem of DRackSim are validated incrementally against those
of Gem5. We measure the performance of various HPC workloads and show the
performance impact of different node/pool configurations.
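To make the queue-based modeling idea concrete, here is a minimal sketch, assuming a round-robin page allocator for the central memory manager and single FIFO servers with fixed service times for the interconnect and the remote memory controller. None of the class or function names (`MemoryPool`, `CentralMemoryManager`, `FifoServer`, `remote_access_latency`) come from DRackSim; they are hypothetical, and the fixed `dram_ns` term is a placeholder for the DRAMSim2 timing that DRackSim actually uses. Only the request path is modeled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MemoryPool:
    """One rack-level memory pool with a free list of page frames (hypothetical)."""
    pool_id: int
    capacity_pages: int
    free_frames: deque = field(default_factory=deque)

    def __post_init__(self):
        self.free_frames.extend(range(self.capacity_pages))


class CentralMemoryManager:
    """Hypothetical global allocator: maps (node, virtual page) -> (pool, frame)."""

    def __init__(self, pools):
        self.pools = pools
        self.page_table = {}
        self._next = 0  # round-robin cursor over pools

    def allocate(self, node_id, vpage):
        # Try each pool once, starting at the cursor, skipping full pools.
        for _ in range(len(self.pools)):
            pool = self.pools[self._next]
            self._next = (self._next + 1) % len(self.pools)
            if pool.free_frames:
                frame = pool.free_frames.popleft()
                self.page_table[(node_id, vpage)] = (pool.pool_id, frame)
                return pool.pool_id, frame
        raise MemoryError("all memory pools are full")


class FifoServer:
    """FIFO queue with a fixed service time (stand-in for a link or remote controller)."""

    def __init__(self, service_ns):
        self.service_ns = service_ns
        self.busy_until = 0.0

    def serve(self, arrival_ns):
        # A request waits until the server is free, then occupies it for service_ns.
        start = max(arrival_ns, self.busy_until)
        self.busy_until = start + self.service_ns
        return self.busy_until


def remote_access_latency(arrival_ns, link, controller, dram_ns):
    """Request path only: interconnect -> remote memory controller -> fixed DRAM time."""
    t = link.serve(arrival_ns)
    t = controller.serve(t)
    return t + dram_ns - arrival_ns


if __name__ == "__main__":
    mgr = CentralMemoryManager([MemoryPool(0, 2), MemoryPool(1, 2)])
    print([mgr.allocate(0, v) for v in range(3)])  # [(0, 0), (1, 0), (0, 1)]
    link, ctrl = FifoServer(10.0), FifoServer(20.0)
    print([remote_access_latency(t, link, ctrl, 50.0) for t in (0.0, 0.0, 5.0)])
    # [80.0, 100.0, 115.0]: later requests queue behind earlier ones
```

The round-robin policy simply spreads pages across pools; a real memory manager could just as well fill pools first-fit or place pages by node affinity.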
Embedding Security into Ferroelectric FET Array via In-Situ Memory Operation
Non-volatile memories (NVMs) have the potential to reshape next-generation
memory systems because of their promising properties: near-zero leakage power,
high density, and non-volatility. However, NVMs also face critical security
threats that exploit this non-volatility: unlike volatile memory, an NVM retains
its data after power-down, which makes it more vulnerable. Existing solutions to
NVM security are mainly based on the Advanced Encryption Standard (AES), which
incurs significant performance and power overhead. In this paper, we propose a
lightweight memory encryption/decryption scheme that exploits in-situ memory
operations with negligible overhead. To validate the feasibility of the scheme,
device-level and array-level experiments are performed using the ferroelectric
field-effect transistor (FeFET) as an example NVM, without loss of generality.
In addition, a comprehensive evaluation is performed on a 128x128 FeFET AND-type
memory array in terms of area, latency, power, and throughput. Compared with the
AES-based scheme, our scheme increases encryption/decryption throughput by about
22.6x/14.1x with a negligible power penalty. Furthermore, we compare our scheme
against the AES-based scheme on different neural network workloads, where it
reduces encryption and decryption latency by 90% on average.
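The abstract does not say which in-situ operation the scheme performs, so the sketch below is only a conceptual stand-in: it assumes, purely for illustration, that encryption is a bitwise XOR of each stored row with a key row, which is the simplest operation that is its own inverse. The functions `make_key` and `xor_rows` are hypothetical and are not the authors' method.

```python
import secrets


def make_key(n_bits):
    """Random key row (hypothetical; in hardware the key would be held in the array)."""
    return [secrets.randbits(1) for _ in range(n_bits)]


def xor_rows(array, key_bits):
    """XOR every row of a bit array with the same key row.

    XOR is its own inverse, so the same call both encrypts and decrypts.
    """
    if any(len(row) != len(key_bits) for row in array):
        raise ValueError("key length must match the row width")
    return [[bit ^ k for bit, k in zip(row, key_bits)] for row in array]


if __name__ == "__main__":
    rows, cols = 4, 8  # toy size; the paper evaluates a 128x128 array
    data = [[(r + c) % 2 for c in range(cols)] for r in range(rows)]
    key = make_key(cols)
    cipher = xor_rows(data, key)
    assert xor_rows(cipher, key) == data  # decryption restores the plaintext
```

Reusing one key row for every row is weak: the XOR of any two ciphertext rows equals the XOR of the corresponding plaintext rows, so a practical scheme would need per-row or refreshed keys.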
FPGA Architecture for 2D Discrete Fourier Transform Based on 2D Decomposition for Large-Sized Data
Applications based on Discrete Fourier Transforms (DFT) are used extensively in signal and digital image processing. Of particular interest is the two-dimensional (2D) DFT, which is more computation- and bandwidth-intensive than the one-dimensional (1D) DFT. Traditionally, a 2D DFT is computed using Row-Column (RC) decomposition, where 1D DFTs are computed along the rows, followed by 1D DFTs along the columns. Both application-specific and reconfigurable hardware have been used for high-performance implementations of the 2D DFT. However, architectures based on RC decomposition are not efficient for large inputs because of memory bandwidth constraints. In this paper, we propose an efficient architecture that implements the 2D DFT for large-sized input data based on a novel 2D decomposition algorithm. This architecture achieves very high throughput by exploiting the inherent parallelism of the algorithm decomposition and by using the row-wise burst access pattern of the external memory. A high-throughput memory interface has been designed to maximize utilization of the memory bandwidth. In addition, an automatic system generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2K x 2K input, the proposed architecture is 1.96x faster than an RC-decomposition-based implementation under the same memory constraints, and it also outperforms other existing implementations.
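The RC decomposition baseline described above can be sketched in a few lines of pure Python: 1D DFTs along every row, then 1D DFTs along every column, checked against the direct 2D DFT definition. This illustrates only the standard RC method, not the paper's novel 2D decomposition, and uses a naive O(N^2) 1D DFT rather than an FFT for clarity. The function names are illustrative.

```python
import cmath


def dft_1d(x):
    """Naive 1D DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def dft_2d_row_column(img):
    """Row-Column (RC) decomposition: 1D DFTs along every row, then along every column."""
    rows = [dft_1d(row) for row in img]               # pass 1: row-wise
    cols = [dft_1d(list(col)) for col in zip(*rows)]  # pass 2: column-wise
    return [list(r) for r in zip(*cols)]              # transpose back to row-major


def dft_2d_direct(img):
    """Direct 2D DFT definition, used only to check the RC result."""
    m, n = len(img), len(img[0])
    return [[sum(img[x][y] * cmath.exp(-2j * cmath.pi * (u * x / m + v * y / n))
                 for x in range(m) for y in range(n))
             for v in range(n)] for u in range(m)]


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [0, 1, 0, 1], [2, 2, 2, 2]]  # 3x4 toy input
    rc, ref = dft_2d_row_column(img), dft_2d_direct(img)
    err = max(abs(a - b) for ra, rb in zip(rc, ref) for a, b in zip(ra, rb))
    print(f"max |RC - direct| = {err:.2e}")  # floating-point rounding only
```

In hardware, the second pass reads the intermediate array column-wise, which maps poorly onto row-wise DRAM bursts once the data no longer fits on chip; this is consistent with the bandwidth limitation the abstract attributes to RC-based architectures.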
Powering Disturb-Free Reconfigurable Computing and Tunable Analog Electronics with Dual-Port Ferroelectric FET
A single-port ferroelectric FET (FeFET), which performs write and read operations
on the same electrical gate, is poorly suited to tunable analog electronics and
suffers from read disturb, especially in the high-threshold-voltage (VTH) state,
because the applied read bias lowers the retention energy barrier. To address
both issues, we propose a read-disturb-free dual-port FeFET in which writes are
performed on a gate with a ferroelectric layer and reads on a separate gate with
a non-ferroelectric dielectric. Thanks to this structure and the separate read
gate, read disturb is eliminated: in the high-VTH state the applied field is
aligned with the polarization, which improves stability, while in the low-VTH
state the field is screened by the channel inversion charge and has no negative
impact on stability. Comprehensive theoretical and experimental validation has
been performed on fully-depleted silicon-on-insulator (FDSOI) FeFETs integrated
on a 22 nm platform, which intrinsically has dual ports, with its buried oxide
layer acting as the non-ferroelectric dielectric. Novel applications that exploit
the proposed dual-port FeFET are proposed and experimentally demonstrated for
the first time, including an FPGA that harnesses its read-disturb-free feature
and tunable analog electronics (e.g., a frequency-tunable ring oscillator in this
work) that leverage the separate write and read paths.
Comment: 32 pages
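As a rough illustration of how a programmable VTH could tune a ring oscillator, the sketch below combines the standard ring-oscillator relation f = 1 / (2 N t_d) with an alpha-power-law stage delay. It assumes, hypothetically, that the programmed FeFET VTH directly sets each stage's drive; the abstract does not describe the circuit, and every default parameter value here is a placeholder, not device data.

```python
def stage_delay_s(vth, vdd=0.8, c_load=1e-15, k_drive=1e-4, alpha=1.3):
    """Alpha-power-law inverter delay: t_d = C * VDD / (k * (VDD - VTH)**alpha).

    All default parameter values are illustrative placeholders, not device data.
    """
    if vth >= vdd:
        raise ValueError("VTH must be below VDD for the stage to switch")
    return c_load * vdd / (k_drive * (vdd - vth) ** alpha)


def ring_oscillator_freq_hz(n_stages, vth, **delay_kwargs):
    """f = 1 / (2 * N * t_d) for a ring of N identical inverting stages (N odd)."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number (>= 3) of inverting stages")
    return 1.0 / (2 * n_stages * stage_delay_s(vth, **delay_kwargs))


if __name__ == "__main__":
    # Hypothetical: each programmed FeFET VTH state sets the drive of every stage.
    for vth in (0.30, 0.40, 0.50):
        f = ring_oscillator_freq_hz(5, vth)
        print(f"VTH = {vth:.2f} V -> f = {f / 1e9:.2f} GHz")
```

Under these assumptions a higher programmed VTH weakens the drive, lengthens the stage delay, and lowers the frequency; with separate write and read gates, the VTH could be reprogrammed without disturbing the read path.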